@sjmonson

Summary

Details

  • [ ]

Test Plan

Related Issues

  • Resolves #

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

tukwila and others added 2 commits September 22, 2025 14:47
Comment on lines 85 to 99
@SchedulerMessagingPydanticRegistry.register()
class ScheduledRequestAugmentation(StandardBaseModel):
    """
    Adjustments to scheduler logic for a paired request.
    """

    post_requeue_delay: float = Field(
        description=(
            "Delay in seconds to wait after a request to "
            "queue the next request in the conversation."
        ),
        default=0.0,
    )



This could be a part of ScheduledRequestInfo but I figured it was not really necessary to pass it back and forth with each request update. Plus I thought this would be a good interface for adjusting the scheduling of individual requests.

Comment on lines 502 to 517
def _apply_history(
    self,
    request: GenerationRequest,
    history: HistoryT[GenerationRequest, GenerationResponse],
) -> GenerationRequest:
    """
    Apply conversation history to the current request.
    """

    def turn_to_text(turn: tuple[GenerationRequest, GenerationResponse]) -> str:
        req, res = turn
        return f"{req.content}{res.value}"

    request.content = "".join(chain(map(turn_to_text, history), (request.content,)))
    return request


Temporary hack until we land request templates.
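
For readers skimming the thread, here is a minimal, self-contained sketch of what the concatenation above does. The `Request` and `Response` dataclasses are hypothetical stand-ins for `GenerationRequest` and `GenerationResponse`, which are not shown in this conversation; only the `content` and `value` attributes are taken from the snippet.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass
class Request:
    # Hypothetical stand-in for GenerationRequest; only `content` is modeled.
    content: str


@dataclass
class Response:
    # Hypothetical stand-in for GenerationResponse; only `value` is modeled.
    value: str


def apply_history(request: Request, history: list[tuple[Request, Response]]) -> Request:
    """Prepend every earlier (request, response) turn to the current request text."""

    def turn_to_text(turn: tuple[Request, Response]) -> str:
        req, res = turn
        return f"{req.content}{res.value}"

    # Earlier turns come first, the current request's own content comes last.
    request.content = "".join(chain(map(turn_to_text, history), (request.content,)))
    return request


history = [(Request("Hi. "), Response("Hello! "))]
current = apply_history(Request("How are you?"), history)
print(current.content)
```

Note that the request is modified in place and no separator is inserted between turns, so any spacing has to come from the turn text itself.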

## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
Final pieces needed for image CI work. Fully enables auto `latest`,
`stable` tags and old image pruning.

## Details

<!--
Provide a detailed list of all changes introduced in this pull request.
-->
- Add `pipefail` to list-tags command to catch failures
- Add missing `ghcr.io/` to skopeo commands
- Disable dry-run option for development image cleanup job

## Test Plan

Ran with `workflow_dispatch` [see
here](https://github.com/vllm-project/guidellm/actions/runs/18108553536)

<img width="2032" height="955" alt="2025-09-29T15-45-39"
src="https://github.com/user-attachments/assets/b981ab01-fe90-4e15-bf60-cb483508065e"
/>
<img width="1204" height="579" alt="2025-09-29T15-46-02"
src="https://github.com/user-attachments/assets/68118168-2e80-4d45-92cc-47badc1caf16"
/>

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Samuel Monson <[email protected]>
@sjmonson sjmonson force-pushed the features/refactor/base-draft branch from 4c4ea5d to aa81de8 Compare September 30, 2025 15:19
@sjmonson sjmonson force-pushed the features/refactor/multiturn branch from 9ae0532 to cd43b2c Compare September 30, 2025 15:39
@sjmonson sjmonson marked this pull request as ready for review September 30, 2025 18:26
markurtz and others added 7 commits October 1, 2025 08:05
## Summary

It's inconvenient to look at metrics.

## Details

-


## Test Plan

- code launch

## Related Issues

- Resolves #371

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
<img width="1757" height="1212" alt="image"
src="https://github.com/user-attachments/assets/fbfddeac-ca56-40c0-b7ae-d2f17d50823a"
/>


## Details

<!--
Provide a detailed list of all changes introduced in this pull request.
-->
- [ ]

## Test Plan

<!--
List the steps needed to test this PR.
-->
-

## Related Issues

<!--
Link any relevant issues that this PR addresses.
-->
- Resolves #

---

- [ ] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
## Summary

With the default path referring to the versioned build now, users will
no longer experience their html reports breaking randomly when the build
files are updated.

Also fixed versioned build directory path issue that I missed previously

---------

Signed-off-by: dalthecow <[email protected]>
@sjmonson sjmonson changed the title [GuideLLM Refactor] Multi-Turn Rework [POST GuideLLM Refactor] Multi-Turn Rework Oct 2, 2025
@sjmonson sjmonson marked this pull request as draft October 2, 2025 20:41
DaltheCow and others added 12 commits October 3, 2025 10:35
## Summary

We want to use ITL instead of TPOT. The data we had previously happened
to be ITL data, but all of the labels indicate that it is TPOT data. Now
the code and labels reflect that it is ITL data.

## Test Plan

- Everything works, tests pass, no use of TPOT in the UI

---------

Signed-off-by: dalthecow <[email protected]>
Co-authored-by: Samuel Monson <[email protected]>
## TODO

- Docs
- ~CSV arg string support~ CSV arg string now supports single bucket
(see last example). Might leave it at that for now.
- More validation

## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->

This PR is a port of #287 to the v0.4.0 refactor branch.

Adds controls for sharing one or more fixed prefixes between samples.
See examples below.

## Details

<!--
Provide a detailed list of all changes introduced in this pull request.
-->

Adds a `prefix_buckets` argument to the `SyntheticTextDatasetConfig`.
Each bucket consists of a prefix count, a token count, and a bucket
weight. Prefix count sets the number of unique prefixes to generate for
a given bucket, token count is the length of each prefix in the bucket,
and bucket weight sets the proportion of requests the bucket applies to,
relative to the sum of all bucket weights. Here are a few
examples:


Here we have one bucket of 32 prefixes of length 2048. Since there are
1024 total samples, each prefix will apply to 32 samples. If there is
only one bucket, then the weight can be omitted, as the bucket applies
to 100% of samples.

```yaml
data:
  prefix_buckets:
    - prefix_tokens: 2048
      prefix_count: 32
  prompt_tokens: 256
  output_tokens: 256
  samples: 1024
```

In this modified version of the first example, 16 of the prefixes have
2048 tokens while the other 16 have 1024 tokens.

```yaml
data:
  prefix_buckets:
    - prefix_tokens: 2048
      prefix_count: 16
      bucket_weight: 50
    - prefix_tokens: 1024
      prefix_count: 16
      bucket_weight: 50
  prompt_tokens: 256
  output_tokens: 256
  samples: 1024
```

The prefix tokens of a bucket can also be 0 to disable prefixes for
those samples. Here is an example where 40% of the samples have a prefix
of 2048 tokens while the other 60% have no prefix.

```yaml
data:
  prefix_buckets:
    - prefix_tokens: 2048
      bucket_weight: 40
    - prefix_tokens: 0
      bucket_weight: 60
  prompt_tokens: 256
  output_tokens: 256
  samples: 1000
```

If only a single bucket is needed, it can be set at the top level. This
makes the change backwards compatible with the previous interface and
allows the CSV string format to work without parsing nested structures
(at least for this use case).

```yaml
data:
  prefix_tokens: 128
  prefix_count: 10
  prompt_tokens: 256
  output_tokens: 256
  samples: 1000
```
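
The arithmetic behind these examples can be sketched as follows. This is an illustration only, not the code from this PR: the `PrefixBucket` dataclass, its default values, and the `expected_split` helper are hypothetical, and the real sampler may assign buckets randomly by weight rather than in exact proportions.

```python
from dataclasses import dataclass


@dataclass
class PrefixBucket:
    # Field names mirror the YAML keys above; the defaults are assumptions.
    prefix_tokens: int
    prefix_count: int = 1
    bucket_weight: int = 100


def expected_split(buckets: list[PrefixBucket], samples: int) -> list[tuple[float, float]]:
    """Return (samples in bucket, samples per unique prefix) for each bucket."""
    total_weight = sum(bucket.bucket_weight for bucket in buckets)
    split = []
    for bucket in buckets:
        # A bucket's share is its weight relative to the sum of all weights.
        in_bucket = samples * bucket.bucket_weight / total_weight
        split.append((in_bucket, in_bucket / bucket.prefix_count))
    return split


# First example: one bucket, 32 prefixes, 1024 samples.
single = expected_split([PrefixBucket(2048, prefix_count=32)], samples=1024)

# Second example: two equally weighted buckets of 16 prefixes each.
halves = expected_split(
    [PrefixBucket(2048, 16, 50), PrefixBucket(1024, 16, 50)], samples=1024
)

# Third example: 40% with a 2048-token prefix, 60% with no prefix.
mixed = expected_split(
    [PrefixBucket(2048, bucket_weight=40), PrefixBucket(0, bucket_weight=60)],
    samples=1000,
)
print(single, halves, mixed)
```

Under these assumptions each prefix in the first two examples is shared by 32 samples, and the third example splits 1000 samples into 400 and 600.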

## Test Plan

<!--
List the steps needed to test this PR.
-->
- PR includes unit tests for all synthetic dataset changes (`pytest
tests/unit/dataset`)
- Scenarios in the Details section can be used against a model server
with prefix caching and the cache rate can be confirmed by inspecting
console output.

## Related Issues

<!--
Link any relevant issues that this PR addresses.
-->
- Resolves #232
- Closes #287

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [x] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Samuel Monson <[email protected]>
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
Fix to parsing rc ref in CI

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
This is the same fix as #389 but applied to the RC workflow rather than
the release workflow as was the original intent with #389. Both
workflows need this change so not reverting the other one.

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->

## Details

<!--
Provide a detailed list of all changes introduced in this pull request.
-->
- [ ]

## Test Plan

<!--
List the steps needed to test this PR.
-->
-

## Related Issues

<!--
Link any relevant issues that this PR addresses.
-->
- Resolves #

---

- [ ] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->

## Details

<!--
Provide a detailed list of all changes introduced in this pull request.
-->
- [ ]

## Test Plan

<!--
List the steps needed to test this PR.
-->
-

## Related Issues

<!--
Link any relevant issues that this PR addresses.
-->
- Resolves #

---

- [ ] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>
Many of the quality errors come from using the older union style; they appeared when the minimum Python version was raised from 3.9 to 3.10.

Signed-off-by: Jared O'Connell <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->

Makes the `max_tokens` request key configurable through an environment
variable per endpoint type. Defaults to `max_tokens` for legacy
`completions` and `max_completion_tokens` for `chat/completions`

## Details

<!--
Provide a detailed list of all changes introduced in this pull request.
-->
- Add the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option which is a
dict mapping from route name -> output tokens key. Default is
`{"text_completions": "max_tokens", "chat_completions":
"max_completion_tokens"}`

## Test Plan

<!--
List the steps needed to test this PR.
-->
-

## Related Issues

<!--
Link any relevant issues that this PR addresses.
-->
- Closes #395
- Closes #269
- Related #210

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
@sjmonson sjmonson force-pushed the features/refactor/base-draft branch from aa81de8 to 48d1b95 Compare October 10, 2025 16:39
markurtz and others added 27 commits October 16, 2025 15:42
… package and CLI pathways (#414)

## Summary

Changed the benchmarking entrypoint to take in an Args object which is
now used to load scenarios. It enables a single source of truth in
addition to being able to save the exact configurations in the report
output.

## Details

<!--
Provide a detailed list of all changes introduced in this pull request.
-->
- [ ]

## Test Plan

<!--
List the steps needed to test this PR.
-->
-

## Related Issues

<!--
Link any relevant issues that this PR addresses.
-->
- Resolves #

---

- [ ] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
…odec (#411)

## TODO

- [ ] ~~More flexible version locking in multimodal extras group~~
- Goal with this was to add locking for different torchcodec/torch
versions but honestly it's not worth the hassle
- [x] Check for multi-modal libs being installed
- [ ] More testing on `encode_audio`

## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
Replaces audio processing libraries with `torchcodec`, which eliminates
19 dependencies and brings us in line with what HuggingFace `datasets`
is doing.

## Details

<!--
Provide a detailed list of all changes introduced in this pull request.
-->
- 

## Test Plan

<!--
List the steps needed to test this PR.
-->
- Run against audio server with

```bash
guidellm benchmark run \
    --target http://localhost:8000 \
    --profile "synchronous" \
    --max-requests 20 \
    --request-type "audio_transcriptions" \
    --data "openslr/librispeech_asr" \
    --data-args '{"name": "clean", "split": "test"}'
```

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
Adds a `tox` env for updating the lock file. Also allows args for mypy
env.


---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
## Summary

Various type fixes with the goal of not breaking anything.

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
## Summary

TODO

## Details

TODO

## Test Plan

TODO

## Related Issues

TODO
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
Turns the `guidellm[multimodal]` extras group into `guidellm[audio]` and
`guidellm[vision]`.

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
Signed-off-by: Samuel Monson <[email protected]>
## Summary

<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
Install all extras in the container and add `ffmpeg`.

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
@sjmonson sjmonson force-pushed the features/refactor/multiturn branch from cd43b2c to 9669983 Compare October 20, 2025 16:59
7 participants